Clamp idle runner sleep to min(base, 30s) #354
Merged
A runner whose claim returns no work and whose last child has exited has nothing to react to except a timer wake-up: SIGCHLD never fires when running_jobs is empty. Letting the existing claim_backoff_max_secs ramp take over in that state delays workflow-complete, idle-exit, and end_time detection by up to the configured cap for no benefit.

When the runner is idle, the sleep now clamps to min(job_completion_poll_interval, IDLE_BACKOFF_CAP_SECS), where IDLE_BACKOFF_CAP_SECS is a hard-coded 30s. This keeps closing-case detection responsive even when a long base interval is configured for cost reasons, while still honoring the user's preferred minimum cadence when it's tighter than 30s. The busy-at-capacity ramp is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR adjusts the job runner’s adaptive backoff behavior so that when the runner is truly idle (no tracked child processes), its sleep interval is clamped to min(job_completion_poll_interval, 30s), keeping workflow-completion / idle-exit / end-time detection responsive even when a long base poll interval is configured.
Changes:
- Add an explicit “idle (no children)” regime with a hard cap of 30s via `idle_poll_interval` and an `is_idle` flag passed into `next_poll_interval`.
- Update the main loop’s wait selection to use the idle clamp when `running_jobs` is empty, while leaving the busy-at-capacity ramp behavior unchanged.
- Expand unit tests and update documentation to describe the three adaptive-backoff regimes.
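The three regimes listed above can be sketched roughly as follows. This is only an illustration, not the code from `src/client/job_runner.rs`: the names `idle_poll_interval`, `next_poll_interval`, and `IDLE_BACKOFF_CAP_SECS` and the five-argument call shape come from the PR, but the function bodies, the parameter order beyond what the tests show, and the choice to check `is_idle` before `made_progress` are assumptions.

```rust
/// Hard cap for the idle regime; the PR states it is a hard-coded 30s.
const IDLE_BACKOFF_CAP_SECS: f64 = 30.0;

/// Idle clamp: min(base, 30s). Body is an assumption based on the PR text.
fn idle_poll_interval(base: f64) -> f64 {
    base.min(IDLE_BACKOFF_CAP_SECS)
}

/// Sketch of the wait computation for the next loop iteration.
/// Assumption: the idle check takes precedence over the progress check.
fn next_poll_interval(current: f64, base: f64, cap: f64, made_progress: bool, is_idle: bool) -> f64 {
    if is_idle {
        // No tracked children: only a timer can wake the runner.
        return idle_poll_interval(base);
    }
    if made_progress {
        // Any progress resets the wait to the configured base.
        return base;
    }
    // No progress with children still running: double toward the cap.
    // The cap is clamped to at least `base`, as the original doc comment says.
    let cap = cap.max(base);
    (current * 2.0).min(cap)
}

fn main() {
    let (base, cap) = (120.0, 300.0);
    println!("idle:        {}", next_poll_interval(base, base, cap, false, true));
    println!("progress:    {}", next_poll_interval(240.0, base, cap, true, false));
    println!("no progress: {}", next_poll_interval(base, base, cap, false, false));
}
```

With a base of 120s, the idle branch returns a shorter wait than the base, which is the responsiveness the PR is after.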
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/client/job_runner.rs` | Introduces idle clamp helper/constant, threads `is_idle` through backoff computation, updates wait logic and tests. |
| `docs/src/core/concepts/job-runners.md` | Documents adaptive backoff with separate busy vs. idle regimes and rationale/examples. |
The prior phrasing said the idle wait is "never faster than base", which is wrong when base > 30s — in that case `min(base, 30)` deliberately polls faster than base, which is the whole point. Restate the two guarantees the formula actually provides: the wait is at most 30s, and never longer than the configured base. Code behavior is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
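A small check of the two guarantees named in this commit message, assuming the idle wait is computed as a plain `min(base, 30)`. The helper body and the sample base values are illustrative, not taken from the repository.

```rust
/// Hard-coded idle cap from the PR description.
const IDLE_BACKOFF_CAP_SECS: f64 = 30.0;

/// Assumed body for the idle clamp helper named in the PR.
fn idle_poll_interval(base: f64) -> f64 {
    base.min(IDLE_BACKOFF_CAP_SECS)
}

fn main() {
    // Sample bases below, at, and above the 30s cap (illustrative values).
    for base in [5.0, 30.0, 120.0] {
        let wait = idle_poll_interval(base);
        // Guarantee 1: the idle wait is at most 30s.
        assert!(wait <= IDLE_BACKOFF_CAP_SECS);
        // Guarantee 2: the idle wait is never longer than the configured base.
        assert!(wait <= base);
        println!("base={base}s -> idle wait={wait}s");
    }
}
```

For the 120s base the idle wait is shorter than the base, so "never faster than base" would not hold; "never longer than base" does.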
/// toward `cap`. The cap is clamped to at least `base` so callers cannot
/// accidentally shrink the wait below the configured floor.
fn next_poll_interval(current: f64, base: f64, cap: f64, made_progress: bool) -> f64 {
/// In the non-idle path, progress resets the wait to `base`; an idle
Comment on lines 3715 to 3721
#[test]
fn next_poll_interval_doubles_on_idle() {
    // Empty iterations grow the wait by a factor of two until the cap.
    let base = 30.0;
    let cap = 300.0;
    let mut current = base;
    let steps = [60.0, 120.0, 240.0, 300.0, 300.0];
Comment on lines 3766 to 3772
#[test]
fn next_poll_interval_idle_never_decreases() {
    // An idle step from current=base must never return less than base.
    let base = 30.0;
    let cap = 300.0;
-   let next = next_poll_interval(base, base, cap, false);
+   let next = next_poll_interval(base, base, cap, false, false);
    assert!(next >= base);
Comment on lines +248 to +258
| State | Wait |
| ----------------------------- | ---------------------------------------- |
| Making progress | `job_completion_poll_interval` (base) |
| Busy at capacity, no progress | doubles toward `claim_backoff_max_secs` |
| Idle (no children to reap) | `min(job_completion_poll_interval, 30s)` |

**Busy-at-capacity case (long-running workflows).** When the runner is fully loaded and nothing is completing or being claimed, polling at the base interval would generate unnecessary requests for hours. Each iteration with no progress doubles the wait, capped at `claim_backoff_max_secs` (default 300s). The wait resets to base immediately on any progress: a local completion, a successful claim, or a `SIGCHLD` wake-up.
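As a worked example of the ramp described above, the sketch below doubles a 30s base toward a 300s cap, using the same numbers as the unit test excerpt. `ramp_step` is a hypothetical helper written for this illustration; the real computation lives in `next_poll_interval`.

```rust
/// Hypothetical helper: one no-progress step of the ramp.
/// Doubles the current wait and clamps it to `cap`, where `cap` itself is
/// clamped to at least `base`.
fn ramp_step(current: f64, base: f64, cap: f64) -> f64 {
    let cap = cap.max(base);
    (current * 2.0).min(cap)
}

fn main() {
    let base = 30.0;
    let cap = 300.0; // documented default for claim_backoff_max_secs
    let mut current = base;
    let mut waits = Vec::new();

    // Five consecutive iterations with no progress.
    for _ in 0..5 {
        current = ramp_step(current, base, cap);
        waits.push(current);
    }
    assert_eq!(waits, vec![60.0, 120.0, 240.0, 300.0, 300.0]);

    // Any progress (completion, successful claim, SIGCHLD) resets to base.
    current = base;
    assert_eq!(current, 30.0);
}
```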
daniel-thom added a commit that referenced this pull request May 27, 2026
PR #354 introduced an explicit is_idle code concept ("no tracked children") but left the surrounding docs and test names using "idle" in the loose sense of "iteration with no progress". After the change those two meanings collide.

- Reword next_poll_interval's doc comment to say "no-progress iteration" in the non-idle branch and explicitly note that no-progress does not imply is_idle.
- Rename next_poll_interval_doubles_on_idle -> next_poll_interval_doubles_on_no_progress and next_poll_interval_idle_never_decreases -> next_poll_interval_no_progress_never_decreases; both already exercise is_idle=false, so the new names reflect what they actually cover.
- In docs, replace the "Busy at capacity, no progress" table row and prose with "No progress (children still running)". The ramp engages on any no-progress iteration with running children, not just at capacity (e.g., spare slots but unmet dependencies → server returns no work → still ramps).

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
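The distinction between "no progress" and `is_idle` can be illustrated with a small classifier. The `Regime` enum and `regime` function are hypothetical names invented for this sketch, and treating an empty `running_jobs` as idle regardless of progress is an assumption.

```rust
/// Hypothetical labels for the three documented states.
#[derive(Debug, PartialEq)]
enum Regime {
    MakingProgress,
    NoProgress, // children still running, nothing completed or claimed
    Idle,       // no tracked children at all
}

/// Classify one loop iteration. `running_jobs` stands in for the number of
/// tracked child processes.
fn regime(running_jobs: usize, made_progress: bool) -> Regime {
    if running_jobs == 0 {
        Regime::Idle
    } else if made_progress {
        Regime::MakingProgress
    } else {
        Regime::NoProgress
    }
}

fn main() {
    // Spare slots but unmet dependencies: two children running, the server
    // returns no work. This is a no-progress iteration, not an idle one.
    assert_eq!(regime(2, false), Regime::NoProgress);
    // The last child has exited and the claim returns no work: idle.
    assert_eq!(regime(0, false), Regime::Idle);
    // A completion or successful claim while children are running.
    assert_eq!(regime(2, true), Regime::MakingProgress);
}
```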